
The M51 galaxy (image source)
In the vast expanse of the cosmos, galaxies stand as the cosmic building blocks, each holding secrets that illuminate our understanding of the universe. But amidst this cosmic tapestry lies a puzzle waiting to be solved, a classification system that unlocks the secrets of these celestial entities.
Galaxies, sprawling collections of stars, gas, dust, and dark matter, come in a myriad of shapes, sizes, and structures. From majestic spirals with their sweeping arms to enigmatic ellipticals, galaxies captivate astronomers with their diversity. But understanding this diversity requires more than just appreciation; it necessitates organization, a way to categorize and classify these celestial giants.

The genesis of galaxy classification traces back to the early 20th century, when astronomers like Edwin Hubble embarked on a quest to categorize galaxies. Their efforts birthed what is now known as the Hubble sequence, a classification scheme that divides galaxies into three main types: spirals, ellipticals, and irregulars. Spiral galaxies, with their pinwheel-like arms, showcase ongoing star formation and dynamic galactic disks. Elliptical galaxies, on the other hand, exhibit a smooth, football-like shape, hinting at their older stellar populations and minimal gas and dust content. Irregular galaxies defy convention, with chaotic, asymmetrical shapes born from galactic collisions and interactions.
But why classify galaxies? What purpose does it serve beyond organizing celestial objects into neat categories? The answer lies in the insights it offers into galactic evolution, formation, and the very nature of the universe itself.

Galaxy classification provides a window into the evolutionary pathways of these cosmic entities. Spiral galaxies, for instance, are often sites of active star formation, fueled by the presence of gas and dust in their disks. Meanwhile, elliptical galaxies, with their lack of prominent spiral arms, suggest a different evolutionary history, dominated by stellar aging and galactic mergers.

Furthermore, galaxy classification sheds light on the processes that govern the formation of these cosmic structures. Spirals, believed to form from the gravitational collapse of gas clouds, represent a different formation mechanism from ellipticals, which likely arise from the merger of smaller galaxies. Irregular galaxies, with their chaotic shapes, offer clues to the role of galactic interactions in shaping the cosmic landscape.

Beyond individual galaxies, classification enables astronomers to map the cosmic web, the intricate network of filaments and voids that weave through the universe. By studying the distribution of galaxy types across space, astronomers gain insights into the large-scale structure of the cosmos and the forces that shape it.
In the grand tapestry of the universe, galaxy classification serves as a guiding thread, weaving together our understanding of cosmic evolution, formation, and structure. From the majestic spirals to the enigmatic ellipticals, galaxies offer a glimpse into the cosmic continuum, where past, present, and future merge in a celestial dance of cosmic proportions.
For more information about galaxy classification, Wikipedia is a good place to start!
In this project we will explore some techniques to classify the types of galaxies. Using the dataset (described in the next chapter) we will go through data exploration, data transformation and a couple of models. In particular we will focus on predicting these 3 galaxy types:
The outputs of this project are:
The dataset contains images from SDSS originally meant for the Galaxy Zoo 2 project, during which volunteers manually voted on the classification of every galaxy in the dataset.
The images are available in different shapes:
An additional csv file contains the classification and some additional information:
As we will see in the EDA, the agreement is not always reliable. That means that our ground truth is somewhat "messy".
# General purpose libraries
import polars as pl
import numpy as np
import warnings
import math
from PIL import Image
from pathlib import Path
from multiprocessing import Pool
# Plotting libraries
import plotly
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.io as pio
# Model
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam, RMSprop
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from keras import backend as K
# Metrics
from sklearn.metrics import (confusion_matrix,
                             accuracy_score,
                             f1_score,
                             recall_score,
                             precision_score)
# Setting the plotly theme
pio.templates.default = 'plotly_white'
# Filter warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
# Define categories and encoding
CATS = ['E', 'S', 'SB']
CATS_TO_IDX = {v: k for k, v in enumerate(CATS)}
IDX_TO_CATS = {k: v for k, v in enumerate(CATS)}
# Define categories colors
CATS_COLORS = ['red', 'blue', 'green']
COLOR_MAP = {k: v for k, v in zip(CATS, CATS_COLORS)}
# Define image shapes
SHAPES = ['small', 'medium', 'large']
# Data preparation
TEST_FRAC = 0.3
# Models params
IMG_PARAMS = {
    'small': {
        'height': 69,
        'width': 69,
        'channels': 3,
    },
    'medium': {
        'height': 227,
        'width': 227,
        'channels': 3
    },
    'large': {
        'height': 299,
        'width': 299,
        'channels': 3
    }
}
TRAIN_PATH = Path('/kaggle/working/train')
TEST_PATH = Path('/kaggle/working/test')
AUTOTUNE = tf.data.AUTOTUNE
BATCH_SIZE = 32
TRAIN_TEST_SPLIT = 0.3
EPOCHS = 30
## RUN only
RUN_ONLY = False
project_path = Path('/kaggle/input/resized-reduced-gz2-images')
path_69 = project_path.joinpath('images_E_S_SB_69x69_a_03')
path_227 = project_path.joinpath('images_E_S_SB_227x227_a_03')
path_299 = project_path.joinpath('images_E_S_SB_299x299_a_03')
path_csv = project_path.joinpath('3class_map_a(p).csv')
df = pl.read_csv(path_csv)
df.head()
|   | dr7objid | asset_id | gz2class | total_classifications | total_votes | agreement |
|---|---|---|---|---|---|---|
| i64 | i64 | i64 | str | i64 | i64 | f64 |
| 0 | 587732591714893851 | 58957 | "Sc+t" | 45 | 342 | 1.0 |
| 1 | 588009368545984617 | 193641 | "Sb+t" | 42 | 332 | 1.0 |
| 2 | 587732484359913515 | 55934 | "Ei" | 36 | 125 | 0.384527 |
| 3 | 587741723357282317 | 158501 | "Sc+t" | 28 | 218 | 0.766954 |
| 4 | 587738410866966577 | 110939 | "Er" | 43 | 151 | 0.399222 |
df.describe()
| statistic |   | dr7objid | asset_id | gz2class | total_classifications | total_votes | agreement |
|---|---|---|---|---|---|---|---|
| str | f64 | f64 | f64 | str | f64 | f64 | f64 |
| "count" | 206168.0 | 206168.0 | 206168.0 | "206168" | 206168.0 | 206168.0 | 206168.0 |
| "null_count" | 0.0 | 0.0 | 0.0 | "0" | 0.0 | 0.0 | 0.0 |
| "mean" | 103083.5 | 5.8782e17 | 141056.802763 | null | 42.622196 | 184.118694 | 0.43431 |
| "std" | 59515.719487 | 1.8327e14 | 81082.330522 | null | 5.910699 | 63.328847 | 0.28728 |
| "min" | 0.0 | 5.8772e17 | 3.0 | "Ec" | 16.0 | 32.0 | 0.0 |
| "25%" | 51542.0 | 5.8773e17 | 71370.0 | null | 39.0 | 141.0 | 0.17935 |
| "50%" | 103084.0 | 5.8774e17 | 139530.0 | null | 43.0 | 160.0 | 0.456436 |
| "75%" | 154625.0 | 5.8774e17 | 210903.0 | null | 46.0 | 207.0 | 0.632321 |
| "max" | 206167.0 | 5.8885e17 | 295305.0 | "Sd?t(r)" | 79.0 | 604.0 | 1.0 |
Looking at the dataframe summary, we can see that there are no null counts, so we won't have to worry about missing values. Let's take a look at two interesting columns: total_votes and agreement.
Let's visualize these columns:
x_votes, y_votes = np.histogram(df['total_votes'], bins=10)
x_agree, y_agree = np.histogram(df['agreement'], bins=10)
fig1 = px.histogram(df, x='total_votes', title="Total votes histogram")
fig1.show()
fig2 = px.histogram(df, x='agreement', title="Agreement histogram")
fig2.show()
We can also take a look at the number of unique values in the galaxy classification column gz2class
unique_class = df.n_unique(pl.col('gz2class'))
print(f'There are {unique_class} different galaxy categories')
There are 785 different galaxy categories
That's a lot! This is because the classification system is much more complex (and messier) than the diagram shown in the introduction. Let's take as an example the galaxy classes starting with Sb
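In this notebook the coarse label (E, S or SB) is later read from the dataset's folder structure, but it could equally be derived from the prefix of gz2class. A minimal sketch (the helper name is ours, and the order of the checks matters since "SB" also starts with "S"):

```python
def coarse_class(gz2class: str) -> str:
    """Collapse a detailed Galaxy Zoo 2 label to E / S / SB."""
    if gz2class.startswith("SB"):
        return "SB"   # barred spiral: must be checked before plain "S"
    if gz2class.startswith("S"):
        return "S"    # normal spiral
    if gz2class.startswith("E"):
        return "E"    # elliptical
    raise ValueError(f"Unrecognised gz2class label: {gz2class!r}")

print([coarse_class(c) for c in ["Sc+t", "Ei", "SBc", "Er"]])
```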
df.filter(pl.col('gz2class').str.starts_with('Sb')).unique('gz2class').head(10)
|   | dr7objid | asset_id | gz2class | total_classifications | total_votes | agreement |
|---|---|---|---|---|---|---|
| i64 | i64 | i64 | str | i64 | i64 | f64 |
| 1650 | 587739827667402818 | 142832 | "Sb2l(l)" | 48 | 350 | 0.25661 |
| 9466 | 587729387147427913 | 34702 | "Sb1l(d)" | 39 | 284 | 0.154263 |
| 643 | 588017978367934481 | 226963 | "Sb?l(i)" | 34 | 244 | 0.082038 |
| 3140 | 587739379918372992 | 124278 | "Sb3t(o)" | 50 | 414 | 1.0 |
| 1368 | 587732050024595525 | 50108 | "Sb4m" | 23 | 178 | 1.0 |
| 1223 | 587731890842370258 | 49028 | "Sb2l(i)" | 34 | 224 | 0.045566 |
| 9389 | 587733080808751190 | 64368 | "Sb2t(d)" | 35 | 276 | 0.795675 |
| 58315 | 588023047474249814 | 235667 | "Sb+m(l)" | 58 | 214 | 0.349204 |
| 17132 | 588017719576559713 | 220028 | "Sb3m(i)" | 41 | 325 | 1.0 |
| 1292 | 587738410323214343 | 110584 | "Sb2m(m)" | 46 | 275 | 0.0 |
As mentioned in the introduction, we will focus on the high-level classification: elliptical (E), normal spiral (S) and barred spiral (SB) galaxies. Since the images are already sorted into per-class folders, we will take the classification information directly from the dataset's folder structure.
Here we will infer the ground truth from the dataset folders
def iterate_folder(folder):
    """Collect all jpg files two levels below the given folder"""
    img_type = '.jpg'
    files = []
    for sub_folder in folder.iterdir():
        for active_folder in sub_folder.iterdir():
            files += [f for f in active_folder.iterdir() if f.suffix == img_type]
    return files

def build_df(files):
    path = list(map(str, files))
    asset_id = list(map(lambda x: x.stem, files))
    return pl.DataFrame(dict(path=path, asset_id=asset_id))

small_files = build_df(iterate_folder(path_69)).rename({'path': 'path_small'})
medium_files = build_df(iterate_folder(path_227)).rename({'path': 'path_medium'})
large_files = build_df(iterate_folder(path_299)).rename({'path': 'path_large'})
images_df = (
    small_files
    .join(medium_files, on='asset_id')
    .join(large_files, on='asset_id')
    .with_columns(
        target=pl.col('path_small').str.split("/").list.get(-2)
    )
    .select(['asset_id', 'target', 'path_small', 'path_medium', 'path_large'])
)
df = (
    df
    .with_columns(pl.col('asset_id').cast(pl.Utf8))
    .join(images_df, on='asset_id', how='left')
)
df.describe()
| statistic |   | dr7objid | asset_id | gz2class | total_classifications | total_votes | agreement | target | path_small | path_medium | path_large |
|---|---|---|---|---|---|---|---|---|---|---|---|
| str | f64 | f64 | str | str | f64 | f64 | f64 | str | str | str | str |
| "count" | 206168.0 | 206168.0 | "206168" | "206168" | 206168.0 | 206168.0 | 206168.0 | "133812" | "133812" | "133812" | "133812" |
| "null_count" | 0.0 | 0.0 | "0" | "0" | 0.0 | 0.0 | 0.0 | "72356" | "72356" | "72356" | "72356" |
| "mean" | 103083.5 | 5.8782e17 | null | null | 42.622196 | 184.118694 | 0.43431 | null | null | null | null |
| "std" | 59515.719487 | 1.8327e14 | null | null | 5.910699 | 63.328847 | 0.28728 | null | null | null | null |
| "min" | 0.0 | 5.8772e17 | "100" | "Ec" | 16.0 | 32.0 | 0.0 | "E" | "/kaggle/input/… | "/kaggle/input/… | "/kaggle/input/… |
| "25%" | 51542.0 | 5.8773e17 | null | null | 39.0 | 141.0 | 0.17935 | null | null | null | null |
| "50%" | 103084.0 | 5.8774e17 | null | null | 43.0 | 160.0 | 0.456436 | null | null | null | null |
| "75%" | 154625.0 | 5.8774e17 | null | null | 46.0 | 207.0 | 0.632321 | null | null | null | null |
| "max" | 206167.0 | 5.8885e17 | "99999" | "Sd?t(r)" | 79.0 | 604.0 | 1.0 | "SB" | "/kaggle/input/… | "/kaggle/input/… | "/kaggle/input/… |
There are some missing images in the dataset compared to the csv file. We can drop these rows from the dataframe, since we have no images for them.
df = df.drop_nulls()
print(f'The new dataset has {df.shape[0]} lines')
The new dataset has 133812 lines
Now that we have defined the targets in the dataset, we can take a look at the balance and distribution of the targets
plot_df = df.group_by('target').len().sort('target')
fig = px.bar(plot_df,
             x='target',
             y='len',
             title='Images count per target',
             height=500,
             width=500,
             color='target',
             color_discrete_map=COLOR_MAP,
             opacity=0.6
             )
fig.update_layout(showlegend=False)
if not RUN_ONLY:
    fig.show()
The dataset is not balanced. We will need to take this into account during data pre-processing. We can also check the agreement and total votes histograms per target.
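This notebook handles the imbalance later by downsampling every class to the size of the rarest one. An alternative worth knowing is inverse-frequency class weights, which Keras' `model.fit` accepts through its `class_weight` argument. A sketch with made-up counts (the real ones come from the bar chart above):

```python
# Hypothetical per-class image counts (illustrative values, not the real ones)
counts = {"E": 60000, "S": 66000, "SB": 7000}

total = sum(counts.values())
n_classes = len(counts)

# weight_c = total / (n_classes * count_c): rarer classes get larger weights,
# so their errors contribute more to the loss
class_weight = {cat: total / (n_classes * n) for cat, n in counts.items()}
for cat, w in sorted(class_weight.items()):
    print(f"{cat}: {w:.2f}")
```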
fig = go.Figure()
for cat, color in zip(CATS, CATS_COLORS):
    x = df.filter(pl.col('target') == cat)['total_votes']
    fig.add_trace(go.Histogram(x=x,
                               name=cat,
                               marker_color=color,
                               opacity=0.6
                               )
                  )
fig.update_layout(barmode='overlay',
                  title="Total votes histogram per target"
                  )
if not RUN_ONLY:
    fig.show()
fig = go.Figure()
for cat, color in zip(CATS, CATS_COLORS):
    x = df.filter(pl.col('target') == cat)['agreement']
    fig.add_trace(go.Histogram(x=x,
                               name=cat,
                               marker_color=color,
                               opacity=0.6
                               )
                  )
fig.update_layout(barmode='overlay',
                  title="Agreement histogram per target"
                  )
if not RUN_ONLY:
    fig.show()
We have much lower agreement for the SB and S galaxies. We'll see whether this has an impact on the model or not.
In this part we will explore the images themselves.
def load_image(image_obj, as_image=False):
    """Load an image from a Path, str or numpy array"""
    if isinstance(image_obj, (Path, str)):
        img = Image.open(image_obj)
        if not as_image:
            img = np.array(img)
    elif isinstance(image_obj, np.ndarray):
        img = image_obj
        if as_image:
            raise ValueError('Cannot return a numpy array as a PIL image')
    else:
        img = None
    return img
def plot_images(images, subtitles=None, title=None, maxrow=5, grayscale=False, **kwargs):
    """Utility to plot images"""
    images = list(map(load_image, images))
    if len(images) <= maxrow:
        m = len(images)
        n = 1
    else:
        m = maxrow
        n = math.ceil(len(images)/maxrow)
    fig = make_subplots(n,
                        m,
                        subplot_titles=subtitles,
                        horizontal_spacing=0.05,
                        vertical_spacing=0.05)
    update_args = {}
    for idx, image in enumerate(images):
        row, col = divmod(idx, maxrow)
        if grayscale:
            trace = go.Heatmap(z=image,
                               zmin=kwargs.get('zmin', 0),
                               zmax=kwargs.get('zmax', 255),
                               coloraxis='coloraxis'
                               )
            if idx == 0:
                ref = ''
            else:
                ref = str(idx+1)
            update_args[f'yaxis{ref}_scaleanchor'] = f"x{ref}"
        else:
            trace = go.Image(z=image)
        fig.add_trace(trace,
                      row+1,
                      col+1,
                      )
    if grayscale:
        fig.update_layout(coloraxis={'colorscale': kwargs.get('colorscale', 'viridis')})
        fig.update_layout(**update_args)
    fig.update_xaxes(showgrid=False)
    fig.update_yaxes(showgrid=False)
    fig.update_layout(title=title,
                      height=kwargs.get('height', 400),
                      width=kwargs.get('width', 1200),
                      autosize=kwargs.get('autosize', False))
    fig.update_yaxes(showticklabels=False)
    fig.update_xaxes(showticklabels=False)
    return fig
seed = 42
pictures = []
captions = []
for cat in CATS:
    s = (
        df
        .filter(pl.col('target') == cat)
        .sample(1, seed=seed)
        .select(['dr7objid', 'target', 'path_small', 'path_medium', 'path_large'])
        .transpose()
        .get_column("column_0")
        .to_list()
    )
    pictures.extend(s[2:])
    captions.extend(map(lambda x: f'{s[0]} \n\n Cat: {s[1]}, shape: {x}', ['69x69', '227x227', '299x299']))
fig = plot_images(pictures, captions, title="Images overview", maxrow=3, width=1500, height=800)
if not RUN_ONLY:
    fig.show()
The difference in quality between the image sizes is visible, as well as the difference between the galaxy types. Let's show some samples of every category:
samples = 3
figs = []
for cat in CATS:
    s = (
        df
        .filter(pl.col('target') == cat)
        .sample(samples, seed=seed)
        .select(['dr7objid', 'path_large'])
        .to_dict(as_series=False)
    )
    pictures = s['path_large']
    captions = s['dr7objid']
    fig = plot_images(pictures,
                      captions,
                      title=f"Images overview: {cat}",
                      maxrow=3,
                      width=1500,
                      height=800)
    figs.append(fig)
for fig in figs:
    if not RUN_ONLY:
        fig.show()
A few observations here:
Now let's take a look at the images themselves and extract some useful information
def process_one(data):
    img = Image.open(data['path'])
    img_l = img.convert('L')
    img_arr = np.array(img)
    img_arr_l = np.array(img_l)
    means = img_arr.mean(axis=(0, 1))
    mean_l = img_arr_l.mean()
    return (data['target'], *means, mean_l, img_arr_l)

def batch_process(files):
    with Pool() as pool:
        results = pool.map(process_one, files)
    return results

def process_results(data):
    results = {}
    results['r_mean'] = list(map(lambda x: x[1], data))
    results['g_mean'] = list(map(lambda x: x[2], data))
    results['b_mean'] = list(map(lambda x: x[3], data))
    results['l_mean'] = list(map(lambda x: x[4], data))
    results['p_mean'] = np.mean(np.stack(list(map(lambda x: x[5], data)), axis=2), axis=2)
    return results
%%time
if not RUN_ONLY:
    files = df.select(['target', 'path_small']).rename({'path_small': 'path'}).to_dicts()
    results = batch_process(files)
    imgs_stats = {}
    for cat in CATS:
        data = list(filter(lambda x: x[0] == cat, results))
        imgs_stats[cat] = process_results(data)
CPU times: user 4.78 s, sys: 2.05 s, total: 6.83 s Wall time: 3min 30s
if not RUN_ONLY:
    images = list(map(lambda x: imgs_stats[x]['p_mean'], CATS))
    captions = list(map(lambda x: f'Category: {x}', CATS))
    plot_images(images,
                subtitles=captions,
                title="Mean gray values",
                maxrow=3,
                grayscale=True,
                width=1200,
                zmin=0,
                zmax=190,
                colorscale='Turbo'
                ).show()
The plot above shows the mean value of each pixel (in grayscale) over the complete dataset for each category. The galaxies are well centered in the images. We can also notice that the center of elliptical galaxies seems to be brighter than the others. The orientation of the galaxies appears to vary randomly between images.
Now let's look at the different color channels:
def plot_color_hists(color, imgs_stats, title=''):
    fig = go.Figure()
    traces = []
    for cat in CATS:
        x = imgs_stats[cat][color]
        trace = go.Histogram(x=x,
                             name=cat,
                             marker_color=COLOR_MAP[cat],
                             opacity=0.6,
                             histnorm='percent'
                             )
        traces.append(trace)
    fig.add_traces(traces)
    fig.update_layout(barmode='overlay',
                      title=title)
    return fig
if not RUN_ONLY:
    plot_color_hists('r_mean', imgs_stats, title="Color histogram, channel: R").show()
if not RUN_ONLY:
    plot_color_hists('g_mean', imgs_stats, title="Color histogram, channel: G").show()
if not RUN_ONLY:
    plot_color_hists('b_mean', imgs_stats, title="Color histogram, channel: B").show()
if not RUN_ONLY:
    plot_color_hists('l_mean', imgs_stats, title="Color histogram, channel: L").show()
Be careful when reading the last plots: the colors do not represent the channels but the galaxy category; there is one plot per channel (Red, Green, Blue and Luminance, i.e. grayscale). Once again the S and SB galaxies look very similar.
As we also saw in the image previews, the images are very dark. This is visible in each channel's histogram as well.
For this step I decided to use downsampling and data augmentation. We will first select some data and create symlinks in a working folder so that the files can be accessed with the Keras preprocessing tools.
def symlink_files(dests, source, shape, cat):
    source_path = source.joinpath(shape).joinpath(cat)
    source_path.mkdir(exist_ok=True, parents=True)
    dests = map(lambda x: Path(x), dests)
    for f in dests:
        link = source_path.joinpath(f.name)
        link.symlink_to(f)

def generate_infra(df, source, n_samples, test_frac=TEST_FRAC):
    train_frac = 1 - test_frac
    train_size = np.ceil(train_frac*n_samples).astype(int)
    test_size = n_samples - train_size
    folder_col = ['train']*train_size + ['test']*test_size
    for cat in CATS:
        df_res = (
            df
            .filter(pl.col('target') == cat)
            .sort('agreement', descending=True)
            .sample(n=n_samples, shuffle=True, seed=45)
            .select(['path_small', 'path_medium', 'path_large'])
            .with_columns(pl.Series(name='dest_folder', values=folder_col))
        )
        for folder in ['test', 'train']:
            folder_source = source.joinpath(folder)
            data = (
                df_res
                .filter(pl.col('dest_folder') == folder)
                .to_dict(as_series=False)
            )
            for shape in SHAPES:
                dests = data.get('path_' + shape)
                symlink_files(dests, folder_source, shape, cat)
wk_path = Path('/kaggle/working/')
n_samples = df.filter(pl.col('target') == 'SB').shape[0]
try:
    generate_infra(df, wk_path, n_samples)
except FileExistsError:
    pass
for shape in SHAPES:
    p = TRAIN_PATH.joinpath(shape)
    for cat in p.iterdir():
        files = [f for f in cat.iterdir()]
        l = len(files)
        s = files[0].is_symlink()
        print(f'The dataset {shape} for category {cat.name} has {l} files. Symlink is {s}')
The dataset small for category E has 6982 files. Symlink is True
The dataset small for category S has 6982 files. Symlink is True
The dataset small for category SB has 6982 files. Symlink is True
The dataset medium for category E has 6982 files. Symlink is True
The dataset medium for category S has 6982 files. Symlink is True
The dataset medium for category SB has 6982 files. Symlink is True
The dataset large for category E has 6982 files. Symlink is True
The dataset large for category S has 6982 files. Symlink is True
The dataset large for category SB has 6982 files. Symlink is True
Here we will define some model-building functions that we can reuse across several models.
First we define functions to load the dataset, augment the data and rescale it.
def data_augmentation(imgs, augmentation_layers):
    """Utility to augment the data with a list of layers"""
    for layer in augmentation_layers:
        imgs = layer(imgs)
    return imgs

def load_ds(dataset_name,
            validation_split=TRAIN_TEST_SPLIT,
            tune_performance=True,
            rescale=False,
            augmentation_layers=None
            ):
    """Utility to load training and validation dataset"""
    params = IMG_PARAMS.get(dataset_name)
    data_dir = TRAIN_PATH.joinpath(dataset_name)
    train_ds, val_ds = keras.utils.image_dataset_from_directory(
        directory=data_dir,
        labels='inferred',
        color_mode='rgb',
        batch_size=BATCH_SIZE,
        image_size=(params['height'], params['width']),
        validation_split=validation_split,
        subset='both',
        shuffle=True,
        seed=42
    )
    if tune_performance:
        train_ds = train_ds.cache().prefetch(buffer_size=AUTOTUNE)
        val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)
    if augmentation_layers is not None:
        train_ds = train_ds.map(
            lambda x, y: (data_augmentation(x, augmentation_layers), y),
            num_parallel_calls=AUTOTUNE
        )
    if rescale:
        normalization_layer = layers.Rescaling(1./255)
        # Keep the labels: map must return an (image, label) tuple
        train_ds = train_ds.map(lambda x, y: (normalization_layer(x), y))
        val_ds = val_ds.map(lambda x, y: (normalization_layer(x), y))
    return train_ds, val_ds
Here we will build model generators for the different model architectures:
(modified version from the Keras examples page)
def cnn_model(dataset_name,
              sizes,
              dropout=0.2,
              rescaling=False,
              augmentation_layers=None
              ):
    """
    Build a CNN model
    """
    params = IMG_PARAMS.get(dataset_name)
    input_shape = (params['width'], params['height'], params['channels'])
    stack = []
    stack.append(layers.Input(shape=input_shape, name='Input image'))
    if augmentation_layers is not None:
        for l in augmentation_layers:
            stack.append(l)
    if rescaling:
        stack.append(layers.Rescaling(1.0/255, name='rescaling'))
    for i, size in enumerate(sizes):
        conv = layers.Conv2D(size,
                             3,
                             padding='same',
                             activation='relu',
                             name=f'convolution_{i}'
                             )
        stack.append(conv)
        stack.append(layers.MaxPooling2D(name=f'Pooling_{i}'))
    stack.append(layers.Dropout(dropout, name='Dropout'))
    stack.append(layers.Flatten(name='Flatten'))
    dense_1 = layers.Dense(sizes[-1]*2,
                           activation='relu',
                           name='Dense_1'
                           )
    stack.append(dense_1)
    dense_2 = layers.Dense(3,
                           name='Outputs'
                           )
    stack.append(dense_2)
    model = keras.Sequential(stack)
    return model
def xception_model(dataset_name,
                   sizes,
                   dropout=0.2,
                   rescaling=False,
                   augmentation_layers=None
                   ):
    params = IMG_PARAMS.get(dataset_name)
    input_shape = (params['width'], params['height'], params['channels'])
    x = layers.Input(shape=input_shape, name='Input image')
    inputs = x
    if augmentation_layers is not None:
        for l in augmentation_layers:
            x = l(x)
    if rescaling:
        x = layers.Rescaling(1.0/255, name='Rescaling')(x)
    x = layers.Conv2D(sizes[0], 3, strides=2, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    previous = x
    for size in sizes[1:-1]:
        x = layers.Activation("relu")(x)
        x = layers.SeparableConv2D(size, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
        x = layers.SeparableConv2D(size, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.MaxPooling2D(3, strides=2, padding="same")(x)
        # Project the residual to the right shape before adding it back
        residual = layers.Conv2D(size, 1, strides=2, padding="same")(previous)
        x = layers.add([x, residual])
        previous = x
    x = layers.SeparableConv2D(sizes[-1], 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dropout(dropout)(x)
    outputs = layers.Dense(3, activation=None)(x)
    return keras.Model(inputs, outputs)
class TimestampCallback(tf.keras.callbacks.Callback):
    def __init__(self, metric_name="duration"):
        super().__init__()
        self.__epoch_start = None
        self.__metric_name = metric_name

    def on_epoch_begin(self, epoch, logs=None):
        self.__epoch_start = tf.timestamp()

    def on_epoch_end(self, epoch, logs=None):
        logs[self.__metric_name] = tf.timestamp() - self.__epoch_start
def plot_convergence(history,
                     title="Training stats",
                     train_only=['learning_rate', 'duration'],
                     include_training_time=True
                     ):
    """
    Plot the training stats
    """
    if not isinstance(history, dict):
        hist = history.history
    else:
        hist = history
    metrics = list(hist.keys())
    train_metrics = list(filter(lambda x: x[:4] != 'val_', metrics))
    epochs = list(range(1, len(hist[metrics[0]]) + 1))
    fig = make_subplots(
        rows=1,
        cols=len(train_metrics),
        subplot_titles=train_metrics,
    )
    markers = ['triangle-up-open', 'circle-open']
    colors = ['red', 'blue']
    names = ['training', 'validation']
    for idx, metric in enumerate(train_metrics):
        show_legend = False
        if not idx:
            show_legend = True
        for idy, data in enumerate([metric, 'val_' + metric]):
            if data[4:] in train_only:
                continue
            y = hist.get(data)
            fig.add_trace(
                go.Scatter(
                    x=epochs,
                    y=y,
                    mode='lines+markers',
                    marker=dict(
                        symbol=markers[idy],
                        color=colors[idy],
                        size=8
                    ),
                    name=names[idy],
                    showlegend=show_legend
                ),
                row=1,
                col=idx+1
            )
    if include_training_time:
        time = hist.get("duration", [0])
        tot_time = np.sum(time).astype(int)
        title += f' Total training time: {tot_time} s'
        if len(time) < EPOCHS:
            title += " (Early stopping)"
        title += '.'
    fig.update_layout(
        template='plotly_white',
        title=title
    )
    return fig
def run_cnn(shape,
            sizes,
            dropout,
            augmentation_layers=None,
            optimizer=None,
            metrics=['accuracy'],
            loss=None,
            epochs=EPOCHS,
            callbacks=None,
            return_model=False
            ):
    """
    Build and run a model
    """
    train, val = load_ds(shape)
    model = cnn_model(shape,
                      sizes,
                      dropout=dropout,
                      rescaling=True,
                      augmentation_layers=augmentation_layers
                      )
    if optimizer is None:
        optimizer = Adam(0.001)
    if loss is None:
        loss = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    model.compile(optimizer=optimizer,
                  loss=loss,
                  metrics=metrics)
    history = model.fit(train,
                        validation_data=val,
                        epochs=epochs,
                        callbacks=callbacks
                        )
    if return_model:
        return history.history, model, val
    return history.history
image_aug_layers = [
    layers.RandomRotation(0.2),
    layers.RandomZoom(0.3)
]
sizes = [16, 16, 16, 32, 32, 64]
dropout = 0.3
reduce_lr = ReduceLROnPlateau(patience=1, factor=0.5, min_lr=1e-6)
timestp = TimestampCallback()
early = EarlyStopping(patience=10, restore_best_weights=False, verbose=1)
metrics = ['accuracy']
callbacks = [reduce_lr, timestp]
history_s = run_cnn('small', sizes=sizes, dropout=dropout, callbacks=callbacks)
Found 20946 files belonging to 3 classes.
Using 14663 files for training.
Using 6283 files for validation.
Epoch 1/30
2024-04-24 23:16:33.627173: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 9522: 4.12939, expected 3.20966
2024-04-24 23:16:33.627226: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 9523: 5.91512, expected 4.99538
2024-04-24 23:16:33.627235: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 9524: 5.91878, expected 4.99904
2024-04-24 23:16:33.627243: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 9525: 6.10096, expected 5.18122
2024-04-24 23:16:33.627250: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 9526: 5.50634, expected 4.58661
2024-04-24 23:16:33.627258: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 9527: 5.67897, expected 4.75924
2024-04-24 23:16:33.627266: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 9528: 6.38289, expected 5.46316
2024-04-24 23:16:33.627273: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 9529: 5.33339, expected 4.41365
2024-04-24 23:16:33.627281: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 9530: 5.37317, expected 4.45343
2024-04-24 23:16:33.627289: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 9531: 4.1374, expected 3.21767
2024-04-24 23:16:33.628298: E external/local_xla/xla/service/gpu/conv_algorithm_picker.cc:705] Results mismatch between different convolution algorithms. This is likely a bug/unexpected loss of precision in cudnn.
(f32[32,16,69,69]{3,2,1,0}, u8[0]{0}) custom-call(f32[32,3,69,69]{3,2,1,0}, f32[16,3,3,3]{3,2,1,0}, f32[16]{0}), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", backend_config={"conv_result_scale":1,"activation_mode":"kRelu","side_input_scale":0,"leakyrelu_alpha":0} for eng20{k2=2,k4=1,k5=1,k6=0,k7=0} vs eng15{k5=1,k6=0,k7=1,k10=1}
2024-04-24 23:16:33.628326: E external/local_xla/xla/service/gpu/conv_algorithm_picker.cc:270] Device: Tesla P100-PCIE-16GB
2024-04-24 23:16:33.628334: E external/local_xla/xla/service/gpu/conv_algorithm_picker.cc:271] Platform: Compute Capability 6.0
2024-04-24 23:16:33.628341: E external/local_xla/xla/service/gpu/conv_algorithm_picker.cc:272] Driver: 12020 (535.129.3)
2024-04-24 23:16:33.628348: E external/local_xla/xla/service/gpu/conv_algorithm_picker.cc:273] Runtime: <undefined>
2024-04-24 23:16:33.628361: E external/local_xla/xla/service/gpu/conv_algorithm_picker.cc:280] cudnn version: 8.9.0
2024-04-24 23:16:33.802748: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 9522: 4.12939, expected 3.20966
2024-04-24 23:16:33.802807: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 9523: 5.91512, expected 4.99538
2024-04-24 23:16:33.802817: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 9524: 5.91878, expected 4.99904
2024-04-24 23:16:33.802825: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 9525: 6.10096, expected 5.18122
2024-04-24 23:16:33.802832: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 9526: 5.50634, expected 4.58661
2024-04-24 23:16:33.802840: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 9527: 5.67897, expected 4.75924
2024-04-24 23:16:33.802850: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 9528: 6.38289, expected 5.46316
2024-04-24 23:16:33.802858: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 9529: 5.33339, expected 4.41365
2024-04-24 23:16:33.802865: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 9530: 5.37317, expected 4.45343
2024-04-24 23:16:33.802873: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 9531: 4.1374, expected 3.21767
2024-04-24 23:16:33.803403: E external/local_xla/xla/service/gpu/conv_algorithm_picker.cc:705] Results mismatch between different convolution algorithms. This is likely a bug/unexpected loss of precision in cudnn.
(f32[32,16,69,69]{3,2,1,0}, u8[0]{0}) custom-call(f32[32,3,69,69]{3,2,1,0}, f32[16,3,3,3]{3,2,1,0}, f32[16]{0}), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", backend_config={"conv_result_scale":1,"activation_mode":"kRelu","side_input_scale":0,"leakyrelu_alpha":0} for eng20{k2=2,k4=1,k5=1,k6=0,k7=0} vs eng15{k5=1,k6=0,k7=1,k10=1}
2024-04-24 23:16:33.803438: E external/local_xla/xla/service/gpu/conv_algorithm_picker.cc:270] Device: Tesla P100-PCIE-16GB
2024-04-24 23:16:33.803451: E external/local_xla/xla/service/gpu/conv_algorithm_picker.cc:271] Platform: Compute Capability 6.0
2024-04-24 23:16:33.803463: E external/local_xla/xla/service/gpu/conv_algorithm_picker.cc:272] Driver: 12020 (535.129.3)
2024-04-24 23:16:33.803478: E external/local_xla/xla/service/gpu/conv_algorithm_picker.cc:273] Runtime: <undefined>
2024-04-24 23:16:33.803495: E external/local_xla/xla/service/gpu/conv_algorithm_picker.cc:280] cudnn version: 8.9.0
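These "Results mismatch between different convolution algorithms" blocks come from XLA's convolution autotuner benchmarking candidate cuDNN engines on the P100; they indicate a precision discrepancy between engines, not a failure, and training continues normally. If the log spam is unwanted, one option (a general TensorFlow setting, not something this notebook does) is to raise TensorFlow's C++ log threshold before `tensorflow` is imported:

```python
import os

# TF_CPP_MIN_LOG_LEVEL controls TensorFlow's C++-side logging:
# "0" = all messages, "1" = hide INFO, "2" = hide INFO+WARNING, "3" = hide INFO+WARNING+ERROR.
# It must be set BEFORE `import tensorflow` to take effect.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
```

Note that level "3" silences all C++-side error messages, not just the autotuner chatter, so it is best used only once a run is known to be healthy.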
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1714000596.889938 84 device_compiler.h:186] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process.
459/459 ━━━━━━━━━━━━━━━━━━━━ 17s 21ms/step - accuracy: 0.4430 - loss: 1.0421 - val_accuracy: 0.5292 - val_loss: 0.9670 - learning_rate: 0.0010 - duration: 16.7463
Epoch 2/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.5589 - loss: 0.9248 - val_accuracy: 0.5865 - val_loss: 0.8674 - learning_rate: 0.0010 - duration: 1.9970
Epoch 3/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.5962 - loss: 0.8624 - val_accuracy: 0.6097 - val_loss: 0.8344 - learning_rate: 0.0010 - duration: 1.9987
Epoch 4/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.6146 - loss: 0.8297 - val_accuracy: 0.6015 - val_loss: 0.8359 - learning_rate: 0.0010 - duration: 2.0080
Epoch 5/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.6405 - loss: 0.7855 - val_accuracy: 0.6350 - val_loss: 0.7835 - learning_rate: 5.0000e-04 - duration: 2.0912
Epoch 6/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.6502 - loss: 0.7617 - val_accuracy: 0.6436 - val_loss: 0.7672 - learning_rate: 5.0000e-04 - duration: 2.4400
Epoch 7/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.6657 - loss: 0.7388 - val_accuracy: 0.6491 - val_loss: 0.7624 - learning_rate: 5.0000e-04 - duration: 1.9893
Epoch 8/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.6842 - loss: 0.7146 - val_accuracy: 0.6519 - val_loss: 0.7623 - learning_rate: 5.0000e-04 - duration: 1.9963
Epoch 9/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.6964 - loss: 0.6814 - val_accuracy: 0.6615 - val_loss: 0.7546 - learning_rate: 2.5000e-04 - duration: 2.0438
Epoch 10/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7075 - loss: 0.6626 - val_accuracy: 0.6643 - val_loss: 0.7494 - learning_rate: 2.5000e-04 - duration: 2.0149
Epoch 11/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7164 - loss: 0.6485 - val_accuracy: 0.6546 - val_loss: 0.7622 - learning_rate: 2.5000e-04 - duration: 1.9803
Epoch 12/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7224 - loss: 0.6315 - val_accuracy: 0.6696 - val_loss: 0.7430 - learning_rate: 1.2500e-04 - duration: 1.9868
Epoch 13/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7270 - loss: 0.6208 - val_accuracy: 0.6720 - val_loss: 0.7410 - learning_rate: 1.2500e-04 - duration: 2.0440
Epoch 14/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7343 - loss: 0.6105 - val_accuracy: 0.6732 - val_loss: 0.7465 - learning_rate: 1.2500e-04 - duration: 2.0091
Epoch 15/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7389 - loss: 0.6021 - val_accuracy: 0.6737 - val_loss: 0.7450 - learning_rate: 6.2500e-05 - duration: 1.9618
Epoch 16/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7437 - loss: 0.5963 - val_accuracy: 0.6763 - val_loss: 0.7429 - learning_rate: 3.1250e-05 - duration: 1.9589
Epoch 17/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7504 - loss: 0.5892 - val_accuracy: 0.6793 - val_loss: 0.7467 - learning_rate: 1.5625e-05 - duration: 1.9530
Epoch 18/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7519 - loss: 0.5859 - val_accuracy: 0.6801 - val_loss: 0.7492 - learning_rate: 7.8125e-06 - duration: 1.9533
Epoch 19/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7488 - loss: 0.5850 - val_accuracy: 0.6801 - val_loss: 0.7499 - learning_rate: 3.9063e-06 - duration: 1.9859
Epoch 20/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7478 - loss: 0.5833 - val_accuracy: 0.6791 - val_loss: 0.7495 - learning_rate: 1.9531e-06 - duration: 2.0122
Epoch 21/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7480 - loss: 0.5848 - val_accuracy: 0.6793 - val_loss: 0.7495 - learning_rate: 1.0000e-06 - duration: 2.0666
Epoch 22/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7477 - loss: 0.5855 - val_accuracy: 0.6790 - val_loss: 0.7494 - learning_rate: 1.0000e-06 - duration: 1.9945
Epoch 23/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7490 - loss: 0.5842 - val_accuracy: 0.6791 - val_loss: 0.7494 - learning_rate: 1.0000e-06 - duration: 1.9913
Epoch 24/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7525 - loss: 0.5838 - val_accuracy: 0.6793 - val_loss: 0.7496 - learning_rate: 1.0000e-06 - duration: 2.0077
Epoch 25/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7523 - loss: 0.5808 - val_accuracy: 0.6791 - val_loss: 0.7499 - learning_rate: 1.0000e-06 - duration: 1.9774
Epoch 26/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7482 - loss: 0.5809 - val_accuracy: 0.6791 - val_loss: 0.7498 - learning_rate: 1.0000e-06 - duration: 1.9981
Epoch 27/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7549 - loss: 0.5795 - val_accuracy: 0.6788 - val_loss: 0.7500 - learning_rate: 1.0000e-06 - duration: 1.9941
Epoch 28/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7538 - loss: 0.5808 - val_accuracy: 0.6790 - val_loss: 0.7500 - learning_rate: 1.0000e-06 - duration: 1.9836
Epoch 29/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7512 - loss: 0.5830 - val_accuracy: 0.6790 - val_loss: 0.7501 - learning_rate: 1.0000e-06 - duration: 2.0738
Epoch 30/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7543 - loss: 0.5836 - val_accuracy: 0.6783 - val_loss: 0.7502 - learning_rate: 1.0000e-06 - duration: 1.9872
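The learning_rate column above starts at 1.0e-3, halves repeatedly (5.0000e-04, 2.5000e-04, ...), and settles at the 1.0000e-06 floor from epoch 21 onward. The callbacks passed to run_cnn are defined earlier in the notebook and not shown in this excerpt, but that trajectory is exactly what a halve-on-plateau rule with a 1e-6 floor produces; a minimal sketch of the rule:

```python
def reduce_on_plateau(lr, factor=0.5, min_lr=1e-6):
    """One learning-rate reduction step: multiply by `factor`, clamp at `min_lr`."""
    return max(lr * factor, min_lr)

lr = 1e-3                 # starting rate seen in epoch 1
schedule = [lr]
for _ in range(12):       # more reductions than needed to reach the floor
    lr = reduce_on_plateau(lr)
    schedule.append(lr)
# schedule passes through 5e-4, 2.5e-4, ..., 1.953125e-06 and then stays at 1e-06,
# matching the 1.9531e-06 -> 1.0000e-06 steps logged at epochs 20-21
```

In Keras, this behaviour corresponds to `tf.keras.callbacks.ReduceLROnPlateau(factor=0.5, min_lr=1e-6)` monitoring a validation metric, though the exact monitor and patience used by this notebook are not visible in the log.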
plot_convergence(history=history_s)
history_sa = run_cnn('small', sizes=sizes, dropout=dropout, callbacks=callbacks, augmentation_layers=image_aug_layers)
Found 20946 files belonging to 3 classes.
Using 14663 files for training.
Using 6283 files for validation.
Epoch 1/30
2024-04-24 23:17:53.456865: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:961] layout failed: INVALID_ARGUMENT: Size of values 0 does not match size of permutation 4 @ fanin shape inStatefulPartitionedCall/sequential_1_1/Dropout_1/stateless_dropout/SelectV2-2-TransposeNHWCToNCHW-LayoutOptimizer
459/459 ━━━━━━━━━━━━━━━━━━━━ 11s 14ms/step - accuracy: 0.4121 - loss: 1.0602 - val_accuracy: 0.5096 - val_loss: 0.9689 - learning_rate: 0.0010 - duration: 11.1980
Epoch 2/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 4s 8ms/step - accuracy: 0.5418 - loss: 0.9471 - val_accuracy: 0.5693 - val_loss: 0.9000 - learning_rate: 0.0010 - duration: 3.5302
Epoch 3/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 4s 8ms/step - accuracy: 0.5675 - loss: 0.9083 - val_accuracy: 0.5816 - val_loss: 0.8770 - learning_rate: 0.0010 - duration: 3.5331
Epoch 4/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 4s 8ms/step - accuracy: 0.5790 - loss: 0.8917 - val_accuracy: 0.5948 - val_loss: 0.8532 - learning_rate: 0.0010 - duration: 3.5916
Epoch 5/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 4s 8ms/step - accuracy: 0.5859 - loss: 0.8672 - val_accuracy: 0.5927 - val_loss: 0.8461 - learning_rate: 0.0010 - duration: 3.5321
Epoch 6/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 4s 8ms/step - accuracy: 0.5985 - loss: 0.8526 - val_accuracy: 0.6207 - val_loss: 0.8179 - learning_rate: 0.0010 - duration: 3.5796
Epoch 7/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 4s 8ms/step - accuracy: 0.6060 - loss: 0.8331 - val_accuracy: 0.6156 - val_loss: 0.8414 - learning_rate: 0.0010 - duration: 3.5560
Epoch 8/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 4s 8ms/step - accuracy: 0.6298 - loss: 0.7967 - val_accuracy: 0.6503 - val_loss: 0.7656 - learning_rate: 5.0000e-04 - duration: 3.5678
Epoch 9/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 4s 8ms/step - accuracy: 0.6403 - loss: 0.7802 - val_accuracy: 0.6451 - val_loss: 0.7793 - learning_rate: 5.0000e-04 - duration: 3.6597
Epoch 10/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 4s 8ms/step - accuracy: 0.6480 - loss: 0.7615 - val_accuracy: 0.6624 - val_loss: 0.7373 - learning_rate: 2.5000e-04 - duration: 3.5612
Epoch 11/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 4s 8ms/step - accuracy: 0.6555 - loss: 0.7481 - val_accuracy: 0.6616 - val_loss: 0.7372 - learning_rate: 2.5000e-04 - duration: 3.5003
Epoch 12/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 3s 8ms/step - accuracy: 0.6692 - loss: 0.7311 - val_accuracy: 0.6613 - val_loss: 0.7336 - learning_rate: 1.2500e-04 - duration: 3.4744
Epoch 13/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 4s 8ms/step - accuracy: 0.6664 - loss: 0.7291 - val_accuracy: 0.6697 - val_loss: 0.7225 - learning_rate: 1.2500e-04 - duration: 3.5611
Epoch 14/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 4s 8ms/step - accuracy: 0.6673 - loss: 0.7255 - val_accuracy: 0.6747 - val_loss: 0.7179 - learning_rate: 1.2500e-04 - duration: 3.6061
Epoch 15/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 4s 8ms/step - accuracy: 0.6691 - loss: 0.7252 - val_accuracy: 0.6748 - val_loss: 0.7159 - learning_rate: 1.2500e-04 - duration: 3.5549
Epoch 16/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 4s 8ms/step - accuracy: 0.6746 - loss: 0.7123 - val_accuracy: 0.6728 - val_loss: 0.7143 - learning_rate: 1.2500e-04 - duration: 3.5368
Epoch 17/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 4s 8ms/step - accuracy: 0.6686 - loss: 0.7162 - val_accuracy: 0.6745 - val_loss: 0.7135 - learning_rate: 1.2500e-04 - duration: 3.5227
Epoch 18/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 4s 8ms/step - accuracy: 0.6717 - loss: 0.7103 - val_accuracy: 0.6777 - val_loss: 0.7049 - learning_rate: 1.2500e-04 - duration: 3.7102
Epoch 19/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 4s 8ms/step - accuracy: 0.6801 - loss: 0.7038 - val_accuracy: 0.6775 - val_loss: 0.7042 - learning_rate: 1.2500e-04 - duration: 3.5439
Epoch 20/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 4s 8ms/step - accuracy: 0.6864 - loss: 0.7013 - val_accuracy: 0.6809 - val_loss: 0.7046 - learning_rate: 1.2500e-04 - duration: 3.5292
Epoch 21/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 4s 8ms/step - accuracy: 0.6844 - loss: 0.6936 - val_accuracy: 0.6810 - val_loss: 0.6968 - learning_rate: 6.2500e-05 - duration: 3.5635
Epoch 22/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 4s 8ms/step - accuracy: 0.6887 - loss: 0.6940 - val_accuracy: 0.6836 - val_loss: 0.6990 - learning_rate: 6.2500e-05 - duration: 3.6448
Epoch 23/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 4s 8ms/step - accuracy: 0.6942 - loss: 0.6894 - val_accuracy: 0.6860 - val_loss: 0.6941 - learning_rate: 3.1250e-05 - duration: 3.6526
Epoch 24/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 4s 8ms/step - accuracy: 0.6920 - loss: 0.6892 - val_accuracy: 0.6836 - val_loss: 0.6951 - learning_rate: 3.1250e-05 - duration: 3.6231
Epoch 25/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 4s 8ms/step - accuracy: 0.6929 - loss: 0.6808 - val_accuracy: 0.6865 - val_loss: 0.6917 - learning_rate: 1.5625e-05 - duration: 3.5664
Epoch 26/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 4s 8ms/step - accuracy: 0.6869 - loss: 0.6853 - val_accuracy: 0.6869 - val_loss: 0.6927 - learning_rate: 1.5625e-05 - duration: 3.7454
Epoch 27/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 4s 8ms/step - accuracy: 0.6899 - loss: 0.6857 - val_accuracy: 0.6880 - val_loss: 0.6910 - learning_rate: 7.8125e-06 - duration: 3.5356
Epoch 28/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 4s 8ms/step - accuracy: 0.6902 - loss: 0.6866 - val_accuracy: 0.6884 - val_loss: 0.6905 - learning_rate: 7.8125e-06 - duration: 3.5494
Epoch 29/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 4s 8ms/step - accuracy: 0.6888 - loss: 0.6886 - val_accuracy: 0.6903 - val_loss: 0.6909 - learning_rate: 7.8125e-06 - duration: 3.5884
Epoch 30/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 4s 8ms/step - accuracy: 0.6923 - loss: 0.6855 - val_accuracy: 0.6887 - val_loss: 0.6906 - learning_rate: 3.9063e-06 - duration: 3.5858
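Reading the two runs side by side: without augmentation the small model ends at 0.7543 training accuracy but only 0.6783 on validation, while the augmented run ends at 0.6923 / 0.6887. Augmentation trades some training accuracy for a much smaller generalization gap. The figures below are transcribed from the final epochs of the logs above:

```python
# Final-epoch metrics transcribed from the two training logs above.
runs = {
    "small":           {"accuracy": 0.7543, "val_accuracy": 0.6783},
    "small+augmented": {"accuracy": 0.6923, "val_accuracy": 0.6887},
}

for name, m in runs.items():
    gap = m["accuracy"] - m["val_accuracy"]   # train/validation gap
    print(f"{name:16s} train-val gap = {gap:+.4f}")
```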
plot_convergence(history=history_sa)
history_m = run_cnn('medium', sizes=sizes, dropout=dropout, callbacks=callbacks)
Found 20946 files belonging to 3 classes.
Using 14663 files for training.
Using 6283 files for validation.
Epoch 1/30
459/459 ━━━━━━━━━━━━━━━━━━━━ 0s 55ms/step - accuracy: 0.4582 - loss: 1.0118
2024-04-24 23:20:46.598053: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 51569: 3.23824, expected 2.79087
2024-04-24 23:20:46.598119: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 51755: 2.81694, expected 2.36957
2024-04-24 23:20:46.598258: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 80357: 3.31329, expected 2.86592
2024-04-24 23:20:46.598353: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 102831: 3.24935, expected 2.80198
2024-04-24 23:20:46.598725: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 154587: 3.18783, expected 2.65505
2024-04-24 23:20:46.598758: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 154588: 4.0353, expected 3.50252
2024-04-24 23:20:46.598773: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 154591: 3.77932, expected 3.24654
2024-04-24 23:20:46.598786: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 154600: 4.0984, expected 3.56562
2024-04-24 23:20:46.598798: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 154601: 4.04705, expected 3.51427
2024-04-24 23:20:46.598811: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 154612: 3.68866, expected 3.15588
2024-04-24 23:20:46.603015: E external/local_xla/xla/service/gpu/conv_algorithm_picker.cc:705] Results mismatch between different convolution algorithms. This is likely a bug/unexpected loss of precision in cudnn.
(f32[11,16,227,227]{3,2,1,0}, u8[0]{0}) custom-call(f32[11,3,227,227]{3,2,1,0}, f32[16,3,3,3]{3,2,1,0}, f32[16]{0}), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", backend_config={"conv_result_scale":1,"activation_mode":"kRelu","side_input_scale":0,"leakyrelu_alpha":0} for eng20{k2=2,k4=1,k5=1,k6=0,k7=0} vs eng15{k5=1,k6=0,k7=1,k10=1}
2024-04-24 23:20:46.603046: E external/local_xla/xla/service/gpu/conv_algorithm_picker.cc:270] Device: Tesla P100-PCIE-16GB
2024-04-24 23:20:46.603054: E external/local_xla/xla/service/gpu/conv_algorithm_picker.cc:271] Platform: Compute Capability 6.0
2024-04-24 23:20:46.603061: E external/local_xla/xla/service/gpu/conv_algorithm_picker.cc:272] Driver: 12020 (535.129.3)
2024-04-24 23:20:46.603068: E external/local_xla/xla/service/gpu/conv_algorithm_picker.cc:273] Runtime: <undefined>
2024-04-24 23:20:46.603083: E external/local_xla/xla/service/gpu/conv_algorithm_picker.cc:280] cudnn version: 8.9.0
2024-04-24 23:20:46.728011: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 51569: 3.23824, expected 2.79087
2024-04-24 23:20:46.728070: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 51755: 2.81694, expected 2.36957
2024-04-24 23:20:46.728255: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 80357: 3.31329, expected 2.86592
2024-04-24 23:20:46.728410: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 102831: 3.24935, expected 2.80198
2024-04-24 23:20:46.728781: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 154587: 3.18783, expected 2.65505
2024-04-24 23:20:46.728805: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 154588: 4.0353, expected 3.50252
2024-04-24 23:20:46.728813: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 154591: 3.77932, expected 3.24654
2024-04-24 23:20:46.728821: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 154600: 4.0984, expected 3.56562
2024-04-24 23:20:46.728829: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 154601: 4.04705, expected 3.51427
2024-04-24 23:20:46.728836: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 154612: 3.68866, expected 3.15588
2024-04-24 23:20:46.732982: E external/local_xla/xla/service/gpu/conv_algorithm_picker.cc:705] Results mismatch between different convolution algorithms. This is likely a bug/unexpected loss of precision in cudnn.
(f32[11,16,227,227]{3,2,1,0}, u8[0]{0}) custom-call(f32[11,3,227,227]{3,2,1,0}, f32[16,3,3,3]{3,2,1,0}, f32[16]{0}), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", backend_config={"conv_result_scale":1,"activation_mode":"kRelu","side_input_scale":0,"leakyrelu_alpha":0} for eng20{k2=2,k4=1,k5=1,k6=0,k7=0} vs eng15{k5=1,k6=0,k7=1,k10=1}
2024-04-24 23:20:46.733012: E external/local_xla/xla/service/gpu/conv_algorithm_picker.cc:270] Device: Tesla P100-PCIE-16GB
2024-04-24 23:20:46.733021: E external/local_xla/xla/service/gpu/conv_algorithm_picker.cc:271] Platform: Compute Capability 6.0
2024-04-24 23:20:46.733028: E external/local_xla/xla/service/gpu/conv_algorithm_picker.cc:272] Driver: 12020 (535.129.3)
2024-04-24 23:20:46.733036: E external/local_xla/xla/service/gpu/conv_algorithm_picker.cc:273] Runtime: <undefined>
2024-04-24 23:20:46.733051: E external/local_xla/xla/service/gpu/conv_algorithm_picker.cc:280] cudnn version: 8.9.0
Epoch 1/30 - accuracy: 0.4584 - loss: 1.0117 - val_accuracy: 0.5865 - val_loss: 0.8798 - learning_rate: 0.0010
[epochs 2-29 omitted]
Epoch 30/30 - accuracy: 0.7010 - loss: 0.6589 - val_accuracy: 0.6487 - val_loss: 0.7774 - learning_rate: 1.0000e-06
plot_convergence(history=history_m)
history_ma = run_cnn('medium', sizes=sizes, dropout=dropout, callbacks=callbacks, augmentation_layers=image_aug_layers)
Found 20946 files belonging to 3 classes. Using 14663 files for training. Using 6283 files for validation.
Epoch 1/30 - accuracy: 0.4225 - loss: 1.0442 - val_accuracy: 0.5499 - val_loss: 0.9368 - learning_rate: 0.0010
[epochs 2-29 omitted]
Epoch 30/30 - accuracy: 0.7175 - loss: 0.6309 - val_accuracy: 0.7223 - val_loss: 0.6213 - learning_rate: 6.2500e-05
plot_convergence(history=history_ma)
def run_xception(shape,
                 sizes,
                 dropout,
                 augmentation_layers=None,
                 optimizer=None,
                 metrics=['accuracy'],
                 loss=None,
                 epochs=EPOCHS,
                 callbacks=None,
                 return_model=False
                 ):
    """
    Build and run an Xception model
    """
    train, val = load_ds(shape, augmentation_layers=augmentation_layers)
    model = xception_model(shape,
                           sizes,
                           dropout=dropout,
                           rescaling=True,
                           )
    if optimizer is None:
        optimizer = Adam(0.001)
    if loss is None:
        loss = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    model.compile(optimizer=optimizer,
                  loss=loss,
                  metrics=metrics)
    history = model.fit(train,
                        validation_data=val,
                        epochs=epochs,
                        callbacks=callbacks
                        )
    if return_model:
        return history.history, model
    return history.history
image_aug_layers = [
    layers.RandomRotation(0.2),
    layers.RandomZoom(0.3)
]
sizes = [128, 256, 512, 728, 1024]
dropout = 0.25
reduce_lr = ReduceLROnPlateau(patience=1, factor=0.5, min_lr=1e-6)
timestp = TimestampCallback()
early = EarlyStopping(patience=10, restore_best_weights=False, verbose=1)
metrics = ['accuracy']
callbacks = [reduce_lr, timestp]
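The learning-rate drops visible in the training logs (1e-3, then 5e-4, 2.5e-4, and so on) come from the `ReduceLROnPlateau` callback configured above. A minimal pure-Python sketch of that schedule, as an approximation only (the real Keras callback also supports `min_delta`, `cooldown`, and a configurable monitored metric):

```python
def simulate_reduce_lr(val_losses, lr=1e-3, factor=0.5, patience=1, min_lr=1e-6):
    """Approximate ReduceLROnPlateau: multiply the learning rate by `factor`
    after `patience` epochs without improvement in validation loss."""
    best = float('inf')
    wait = 0
    lrs = []
    for loss in val_losses:
        lrs.append(lr)          # LR in effect for this epoch
        if loss < best:         # improvement: reset the patience counter
            best = loss
            wait = 0
        else:                   # no improvement: count, then reduce
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)
                wait = 0
    return lrs

# The third epoch shows no improvement, so the LR is halved for epoch 4:
print(simulate_reduce_lr([0.94, 0.88, 0.88, 0.83, 0.84]))
# [0.001, 0.001, 0.001, 0.0005, 0.0005]
```

With `patience=1`, a single epoch without improvement is enough to trigger a reduction, which matches the frequent halvings seen in the runs above.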
history_s_x = run_xception('small', sizes, dropout, callbacks=callbacks, epochs=15)
Found 20946 files belonging to 3 classes. Using 14663 files for training. Using 6283 files for validation.
Epoch 1/15 - accuracy: 0.5302 - loss: 0.9612 - val_accuracy: 0.3357 - val_loss: 1.7060 - learning_rate: 0.0010
[epochs 2-14 omitted]
Epoch 15/15 - accuracy: 0.9994 - loss: 0.0163 - val_accuracy: 0.7071 - val_loss: 1.2352 - learning_rate: 1.9531e-06
plot_convergence(history=history_s_x)
history_sa_x = run_xception('small', sizes, dropout, callbacks=callbacks, epochs=15, augmentation_layers=image_aug_layers)
Found 20946 files belonging to 3 classes. Using 14663 files for training. Using 6283 files for validation.
Epoch 1/15 - accuracy: 0.4996 - loss: 1.0060 - val_accuracy: 0.5803 - val_loss: 0.9845 - learning_rate: 0.0010
[epochs 2-14 omitted]
Epoch 15/15 - accuracy: 0.7634 - loss: 0.5376 - val_accuracy: 0.7425 - val_loss: 0.5795 - learning_rate: 6.2500e-05
plot_convergence(history=history_sa_x)
history_m_x = run_xception('medium', sizes, dropout, callbacks=callbacks, epochs=15)
Found 20946 files belonging to 3 classes. Using 14663 files for training. Using 6283 files for validation.
Epoch 1/15 - accuracy: 0.4927 - loss: 1.0053 - val_accuracy: 0.3584 - val_loss: 1.2137 - learning_rate: 0.0010
[epochs 2-14 omitted]
Epoch 15/15 - accuracy: 0.8319 - loss: 0.4090 - val_accuracy: 0.7367 - val_loss: 0.6407 - learning_rate: 1.9531e-06
plot_convergence(history=history_m_x)
history_ma_x = run_xception('medium', sizes, dropout, callbacks=callbacks, epochs=15, augmentation_layers=image_aug_layers)
Found 20946 files belonging to 3 classes. Using 14663 files for training. Using 6283 files for validation.
Epoch 1/15 - accuracy: 0.4716 - loss: 1.0387 - val_accuracy: 0.3408 - val_loss: 1.3152 - learning_rate: 0.0010
[epochs 2-14 omitted]
Epoch 15/15 - accuracy: 0.7504 - loss: 0.5661 - val_accuracy: 0.7543 - val_loss: 0.5615 - learning_rate: 6.2500e-05
plot_convergence(history=history_ma_x)
def get_history_means(history):
    if isinstance(history, dict):
        hist = history
    else:
        hist = history.history
    res = dict(
        epoch_counts=len(hist.get('accuracy')),
        max_val_acc=np.round(max(hist.get('val_accuracy')), 4),
        min_val_loss=np.round(min(hist.get('val_loss')), 4),
        min_lr=np.round(min(hist.get('learning_rate')), 4),
        tot_duration=np.sum(hist.get('duration')).astype(int),
    )
    return res
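To illustrate what `get_history_means` summarizes, here are the same fields computed inline on a hypothetical three-epoch history dict (values invented for illustration, in the shape Keras returns):

```python
import numpy as np

# Hypothetical three-epoch history, same shape as a Keras History.history dict
hist = {
    'accuracy':      [0.45, 0.57, 0.60],
    'val_accuracy':  [0.58, 0.60, 0.61],
    'val_loss':      [0.88, 0.88, 0.84],
    'learning_rate': [1e-3, 1e-3, 5e-4],
    'duration':      [43.9, 9.2, 9.7],
}

summary = dict(
    epoch_counts=len(hist['accuracy']),                  # number of epochs run
    max_val_acc=np.round(max(hist['val_accuracy']), 4),  # best validation accuracy
    min_val_loss=np.round(min(hist['val_loss']), 4),     # best validation loss
    min_lr=np.round(min(hist['learning_rate']), 4),      # final (lowest) learning rate
    tot_duration=np.sum(hist['duration']).astype(int),   # total training time, seconds
)
print(summary)
```

Despite its name, the function reports per-run extremes and totals rather than means, which is what the comparison table below needs.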
histories = [history_s,
             history_sa,
             history_m,
             history_ma,
             history_s_x,
             history_sa_x,
             history_m_x,
             history_ma_x]
model = ['CNN'] * 4 + ['Xception'] * 4
input_data = ['small', 'small', 'medium', 'medium'] * 2
augmentation = ['False', 'True'] * 4
histories_data = list(map(get_history_means, histories))
epoch_counts = list(map(lambda x: x.get('epoch_counts'), histories_data))
max_val_acc = list(map(lambda x: x.get('max_val_acc'), histories_data))
min_val_loss = list(map(lambda x: x.get('min_val_loss'), histories_data))
tot_duration = list(map(lambda x: x.get('tot_duration'), histories_data))
header = dict(values=['Model type',
'Input data',
'Augmentation',
'Epoch counts',
'Max accuracy (val)',
'Min loss (val)',
'Total duration (s)'
])
cells = dict(values=[model,
input_data,
augmentation,
epoch_counts,
max_val_acc,
min_val_loss,
tot_duration
])
CNN
Before training on all the image shapes, with and without data augmentation, I tried several architectures for the CNN. I decided to keep this one because it gave the best results.
Once the architecture was chosen, I tried small (69x69) and medium (227x227) images, with and without data augmentation.
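The architecture itself is defined in an earlier cell; as a rough, hypothetical sketch of the kind of small CNN used here (the filter sizes below are illustrative, not the notebook's exact choice):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical sketch of a small CNN in the spirit of the one kept here;
# the notebook's actual architecture is defined in an earlier cell.
def simple_cnn(image_size=69, n_classes=3, dropout=0.2):
    inputs = keras.Input(shape=(image_size, image_size, 3))
    x = layers.Rescaling(1.0 / 255)(inputs)
    for filters in [32, 64, 128]:  # illustrative sizes
        x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
        x = layers.MaxPooling2D()(x)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dropout(dropout)(x)
    outputs = layers.Dense(n_classes)(x)  # raw logits, matching from_logits=True
    return keras.Model(inputs, outputs)

cnn_sketch = simple_cnn()
print(cnn_sketch.output_shape)  # (None, 3)
```

The model outputs raw logits rather than probabilities, which is why the loss is compiled with `from_logits=True` throughout the notebook.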
Here are a few considerations about the CNN training runs:
Xception
In general, this architecture performed better than the simple CNN architecture. The observations we made about the CNN remain valid: data augmentation increases running time, and larger images take longer to train on. We can also see overfitting in the models without data augmentation.
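The `xception_model` helper is defined earlier in the notebook. As a hypothetical illustration of the Xception idea it is named after, a depthwise-separable residual block might look like this (a sketch only; the `sizes` list echoes the one used above, but this is not the notebook's exact implementation):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Illustrative sketch of an Xception-style block: depthwise-separable
# convolutions with a strided 1x1 residual shortcut. The notebook's
# xception_model helper (defined earlier) is the authoritative version.
def sep_conv_block(x, filters):
    residual = layers.Conv2D(filters, 1, strides=2, padding='same')(x)
    y = layers.Activation('relu')(x)
    y = layers.SeparableConv2D(filters, 3, padding='same')(y)
    y = layers.BatchNormalization()(y)
    y = layers.Activation('relu')(y)
    y = layers.SeparableConv2D(filters, 3, padding='same')(y)
    y = layers.MaxPooling2D(3, strides=2, padding='same')(y)
    return layers.add([y, residual])  # residual sum, as in Xception

inputs = keras.Input(shape=(69, 69, 3))
x = layers.Conv2D(32, 3, strides=2, padding='same', activation='relu')(inputs)
for size in [128, 256, 512]:  # the notebook's sizes list goes up to 1024
    x = sep_conv_block(x, size)
sketch = keras.Model(inputs, x)
print(sketch.output_shape)
```

Separable convolutions factor a standard convolution into a depthwise and a pointwise step, which is what keeps this architecture cheap enough to train quickly on small images.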
table = go.Figure(data=[go.Table(header=header, cells=cells)])
table.update_layout(paper_bgcolor='rgba(0,0,0,0)',
plot_bgcolor='rgba(0,0,0,0)',
margin=dict(l=0, r=0, t=0, b=0),
height=205
)
table.show()
The model with the best trade-off between validation accuracy and training duration is the Xception model with small images and data augmentation. I will try to push this model a bit further.
def plot_results(y_pred, y_true, title='Prediction stats', average='binary'):
    """
    Plot the confusion matrix and the main classification metrics
    """
    cm = confusion_matrix(y_pred=y_pred, y_true=y_true)
    ac = accuracy_score(y_pred=y_pred, y_true=y_true)
    f1 = f1_score(y_pred=y_pred, y_true=y_true, average=average)
    recall = recall_score(y_pred=y_pred, y_true=y_true, average=average)
    precision = precision_score(y_pred=y_pred, y_true=y_true, average=average)
    fig = make_subplots(
        rows=1,
        cols=2,
        column_widths=[0.75, 0.25]
    )
    fig.add_trace(
        go.Heatmap(
            z=cm,
            x=CATS,
            y=CATS,
            showscale=True,
            transpose=False,
            text=cm,
            textfont={"size": 30},
            texttemplate="%{text}",
            colorscale='viridis',
            xgap=8,
            ygap=8,
        ),
        row=1,
        col=2,
    )
    fig.add_trace(
        go.Bar(
            x=['Accuracy', "F1 score", "Recall", "Precision"],
            y=[ac, f1, recall, precision],
            marker=dict(
                color=[ac, f1, recall, precision],
                colorscale='Bluered',
                cmin=0,
                cmax=1,
                reversescale=True
            ),
            text=[ac, f1, recall, precision],
            texttemplate="%{text:.4f}",
            textfont={"size": 30}
        ),
        row=1,
        col=1
    )
    fig.update_layout(
        yaxis1_range=[0, 1],
        xaxis2_title="Predicted",
        yaxis2_title="Expected",
        title=title + f', (average = {average})',
        template='plotly_white'
    )
    return fig
def make_predictions(trained_model, test_ds):
    predictions = trained_model.predict(test_ds)
    pred_soft = tf.nn.softmax(predictions)
    y_pred = np.argmax(pred_soft, axis=1)
    y_true = np.concatenate([y for x, y in test_ds], axis=0)
    return y_pred, y_true
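For intuition, the softmax-then-argmax step of `make_predictions` can be reproduced in plain NumPy on invented logits:

```python
import numpy as np

# Invented logits for 4 samples over the 3 classes ('E', 'S', 'SB')
logits = np.array([[2.0, 0.1, -1.0],
                   [0.2, 1.5, 0.3],
                   [-0.5, 0.0, 2.2],
                   [1.0, 1.1, 0.9]])

# Row-wise softmax, with the usual max-subtraction for numerical stability
shifted = logits - logits.max(axis=1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
y_pred = np.argmax(probs, axis=1)
print(y_pred)  # [0 1 2 1]
```

Because softmax is strictly increasing within each row, applying it before `argmax` does not change the predicted labels; it only converts the logits into probabilities.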
test_ds = keras.utils.image_dataset_from_directory(TEST_PATH.joinpath('small'),
labels='inferred',
image_size=(69, 69), shuffle=False)
test_ds.class_names
Found 8976 files belonging to 3 classes.
['E', 'S', 'SB']
train, val = keras.utils.image_dataset_from_directory(TRAIN_PATH.joinpath('small'),
labels='inferred',
image_size=(69, 69),
shuffle=True,
subset='both',
validation_split=TRAIN_TEST_SPLIT,
seed=42
)
train.class_names
Found 20946 files belonging to 3 classes. Using 14663 files for training. Using 6283 files for validation.
['E', 'S', 'SB']
train = train.cache().prefetch(buffer_size=AUTOTUNE)
val = val.cache().prefetch(buffer_size=AUTOTUNE)
image_aug_layers = [
layers.RandomRotation(0.2),
layers.RandomZoom(0.3)
]
sizes = [128, 256, 512, 728, 1024]
dropout = 0.2
reduce_lr = ReduceLROnPlateau(patience=1, factor=0.5, min_lr=1e-6)
timestp = TimestampCallback()
early = EarlyStopping(patience=4, restore_best_weights=True, verbose=1)
metrics = ['accuracy']
callbacks = [reduce_lr, timestp, early]
model = xception_model('small',
sizes,
dropout = dropout,
rescaling = True,
augmentation_layers = image_aug_layers
)
model.compile(optimizer=Adam(0.001),
loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=metrics)
history= model.fit(train,
validation_data=val,
callbacks=callbacks,
epochs=20)
Epoch 1/20
459/459 ━━━━━━━━━━━━━━━━━━━━ 38s 62ms/step - accuracy: 0.4935 - loss: 1.0088 - val_accuracy: 0.5598 - val_loss: 0.9911 - learning_rate: 0.0010 - duration: 37.5138
Epoch 2/20
459/459 ━━━━━━━━━━━━━━━━━━━━ 27s 58ms/step - accuracy: 0.5974 - loss: 0.8377 - val_accuracy: 0.6185 - val_loss: 0.9648 - learning_rate: 0.0010 - duration: 26.5855
Epoch 3/20
459/459 ━━━━━━━━━━━━━━━━━━━━ 27s 58ms/step - accuracy: 0.6501 - loss: 0.7712 - val_accuracy: 0.5988 - val_loss: 1.0398 - learning_rate: 0.0010 - duration: 26.5009
Epoch 4/20
459/459 ━━━━━━━━━━━━━━━━━━━━ 27s 58ms/step - accuracy: 0.6915 - loss: 0.6992 - val_accuracy: 0.6721 - val_loss: 0.7377 - learning_rate: 5.0000e-04 - duration: 26.7738
Epoch 5/20
459/459 ━━━━━━━━━━━━━━━━━━━━ 27s 59ms/step - accuracy: 0.7068 - loss: 0.6682 - val_accuracy: 0.7121 - val_loss: 0.6598 - learning_rate: 5.0000e-04 - duration: 26.9327
Epoch 6/20
459/459 ━━━━━━━━━━━━━━━━━━━━ 41s 58ms/step - accuracy: 0.7198 - loss: 0.6440 - val_accuracy: 0.6772 - val_loss: 0.7370 - learning_rate: 5.0000e-04 - duration: 40.7627
Epoch 7/20
459/459 ━━━━━━━━━━━━━━━━━━━━ 27s 59ms/step - accuracy: 0.7265 - loss: 0.6194 - val_accuracy: 0.7301 - val_loss: 0.6219 - learning_rate: 2.5000e-04 - duration: 26.9575
Epoch 8/20
459/459 ━━━━━━━━━━━━━━━━━━━━ 27s 59ms/step - accuracy: 0.7331 - loss: 0.6082 - val_accuracy: 0.7282 - val_loss: 0.6266 - learning_rate: 2.5000e-04 - duration: 27.0895
Epoch 9/20
459/459 ━━━━━━━━━━━━━━━━━━━━ 27s 58ms/step - accuracy: 0.7458 - loss: 0.5930 - val_accuracy: 0.7184 - val_loss: 0.6288 - learning_rate: 1.2500e-04 - duration: 26.7766
Epoch 10/20
459/459 ━━━━━━━━━━━━━━━━━━━━ 27s 59ms/step - accuracy: 0.7537 - loss: 0.5680 - val_accuracy: 0.7240 - val_loss: 0.6134 - learning_rate: 6.2500e-05 - duration: 26.8254
Epoch 11/20
459/459 ━━━━━━━━━━━━━━━━━━━━ 27s 58ms/step - accuracy: 0.7522 - loss: 0.5618 - val_accuracy: 0.7239 - val_loss: 0.6197 - learning_rate: 6.2500e-05 - duration: 26.7779
Epoch 12/20
459/459 ━━━━━━━━━━━━━━━━━━━━ 27s 58ms/step - accuracy: 0.7569 - loss: 0.5561 - val_accuracy: 0.7399 - val_loss: 0.5926 - learning_rate: 3.1250e-05 - duration: 26.7953
Epoch 13/20
459/459 ━━━━━━━━━━━━━━━━━━━━ 27s 58ms/step - accuracy: 0.7537 - loss: 0.5525 - val_accuracy: 0.7340 - val_loss: 0.5990 - learning_rate: 3.1250e-05 - duration: 26.7822
Epoch 14/20
459/459 ━━━━━━━━━━━━━━━━━━━━ 27s 58ms/step - accuracy: 0.7555 - loss: 0.5495 - val_accuracy: 0.7434 - val_loss: 0.5856 - learning_rate: 1.5625e-05 - duration: 26.8164
Epoch 15/20
459/459 ━━━━━━━━━━━━━━━━━━━━ 27s 59ms/step - accuracy: 0.7583 - loss: 0.5496 - val_accuracy: 0.7425 - val_loss: 0.5826 - learning_rate: 1.5625e-05 - duration: 26.8653
Epoch 16/20
459/459 ━━━━━━━━━━━━━━━━━━━━ 27s 59ms/step - accuracy: 0.7604 - loss: 0.5408 - val_accuracy: 0.7436 - val_loss: 0.5848 - learning_rate: 1.5625e-05 - duration: 26.9197
Epoch 17/20
459/459 ━━━━━━━━━━━━━━━━━━━━ 27s 59ms/step - accuracy: 0.7626 - loss: 0.5425 - val_accuracy: 0.7504 - val_loss: 0.5748 - learning_rate: 7.8125e-06 - duration: 26.9003
Epoch 18/20
459/459 ━━━━━━━━━━━━━━━━━━━━ 27s 58ms/step - accuracy: 0.7597 - loss: 0.5440 - val_accuracy: 0.7473 - val_loss: 0.5774 - learning_rate: 7.8125e-06 - duration: 26.7344
Epoch 19/20
459/459 ━━━━━━━━━━━━━━━━━━━━ 27s 59ms/step - accuracy: 0.7604 - loss: 0.5419 - val_accuracy: 0.7504 - val_loss: 0.5742 - learning_rate: 3.9063e-06 - duration: 26.8792
Epoch 20/20
459/459 ━━━━━━━━━━━━━━━━━━━━ 27s 59ms/step - accuracy: 0.7646 - loss: 0.5406 - val_accuracy: 0.7519 - val_loss: 0.5724 - learning_rate: 3.9063e-06 - duration: 26.8689
Restoring model weights from the end of the best epoch: 20.
plot_convergence(history=history, title=f"Final model, dropout = {dropout}")
y_pred, y_true = make_predictions(model, test_ds)
plot_results(y_pred, y_true, title=f'Prediction stats, dropout = {dropout}', average='macro')
281/281 ━━━━━━━━━━━━━━━━━━━━ 4s 13ms/step
dropout = 0.3
reduce_lr = ReduceLROnPlateau(patience=1, factor=0.5, min_lr=1e-6)
timestp = TimestampCallback()
early = EarlyStopping(patience=4, restore_best_weights=True, verbose=1)
metrics = ['accuracy']
callbacks = [reduce_lr, timestp, early]
model_1 = xception_model('small',
sizes,
dropout = dropout,
rescaling = True,
augmentation_layers = image_aug_layers
)
model_1.compile(optimizer=Adam(0.001),
loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=metrics)
history_1= model_1.fit(train,
validation_data=val,
callbacks=callbacks,
epochs=20)
Epoch 1/20
459/459 ━━━━━━━━━━━━━━━━━━━━ 34s 59ms/step - accuracy: 0.5104 - loss: 0.9970 - val_accuracy: 0.5329 - val_loss: 1.0005 - learning_rate: 0.0010 - duration: 33.9762
Epoch 2/20
459/459 ━━━━━━━━━━━━━━━━━━━━ 27s 58ms/step - accuracy: 0.6117 - loss: 0.8385 - val_accuracy: 0.5139 - val_loss: 1.4021 - learning_rate: 0.0010 - duration: 26.6046
Epoch 3/20
459/459 ━━━━━━━━━━━━━━━━━━━━ 27s 58ms/step - accuracy: 0.6596 - loss: 0.7552 - val_accuracy: 0.6836 - val_loss: 0.7100 - learning_rate: 5.0000e-04 - duration: 26.7069
Epoch 4/20
459/459 ━━━━━━━━━━━━━━━━━━━━ 27s 58ms/step - accuracy: 0.6908 - loss: 0.6990 - val_accuracy: 0.6755 - val_loss: 0.7698 - learning_rate: 5.0000e-04 - duration: 26.6636
Epoch 5/20
459/459 ━━━━━━━━━━━━━━━━━━━━ 27s 58ms/step - accuracy: 0.7095 - loss: 0.6586 - val_accuracy: 0.6828 - val_loss: 0.7159 - learning_rate: 2.5000e-04 - duration: 26.7007
Epoch 6/20
459/459 ━━━━━━━━━━━━━━━━━━━━ 27s 58ms/step - accuracy: 0.7277 - loss: 0.6228 - val_accuracy: 0.6892 - val_loss: 0.6967 - learning_rate: 1.2500e-04 - duration: 26.7075
Epoch 7/20
459/459 ━━━━━━━━━━━━━━━━━━━━ 27s 58ms/step - accuracy: 0.7342 - loss: 0.6038 - val_accuracy: 0.7223 - val_loss: 0.6288 - learning_rate: 1.2500e-04 - duration: 26.6872
Epoch 8/20
459/459 ━━━━━━━━━━━━━━━━━━━━ 27s 58ms/step - accuracy: 0.7388 - loss: 0.5998 - val_accuracy: 0.6951 - val_loss: 0.6886 - learning_rate: 1.2500e-04 - duration: 26.6740
Epoch 9/20
459/459 ━━━━━━━━━━━━━━━━━━━━ 27s 58ms/step - accuracy: 0.7400 - loss: 0.5904 - val_accuracy: 0.6941 - val_loss: 0.6755 - learning_rate: 6.2500e-05 - duration: 26.7107
Epoch 10/20
459/459 ━━━━━━━━━━━━━━━━━━━━ 27s 58ms/step - accuracy: 0.7458 - loss: 0.5784 - val_accuracy: 0.7227 - val_loss: 0.6223 - learning_rate: 3.1250e-05 - duration: 26.6249
Epoch 11/20
459/459 ━━━━━━━━━━━━━━━━━━━━ 27s 58ms/step - accuracy: 0.7502 - loss: 0.5716 - val_accuracy: 0.7318 - val_loss: 0.6071 - learning_rate: 3.1250e-05 - duration: 26.7093
Epoch 12/20
459/459 ━━━━━━━━━━━━━━━━━━━━ 27s 58ms/step - accuracy: 0.7554 - loss: 0.5676 - val_accuracy: 0.7307 - val_loss: 0.6074 - learning_rate: 3.1250e-05 - duration: 26.6794
Epoch 13/20
459/459 ━━━━━━━━━━━━━━━━━━━━ 27s 58ms/step - accuracy: 0.7520 - loss: 0.5586 - val_accuracy: 0.7355 - val_loss: 0.5986 - learning_rate: 1.5625e-05 - duration: 26.7133
Epoch 14/20
459/459 ━━━━━━━━━━━━━━━━━━━━ 41s 58ms/step - accuracy: 0.7558 - loss: 0.5527 - val_accuracy: 0.7350 - val_loss: 0.5970 - learning_rate: 1.5625e-05 - duration: 40.9187
Epoch 15/20
459/459 ━━━━━━━━━━━━━━━━━━━━ 27s 58ms/step - accuracy: 0.7552 - loss: 0.5605 - val_accuracy: 0.7344 - val_loss: 0.5948 - learning_rate: 1.5625e-05 - duration: 26.7387
Epoch 16/20
459/459 ━━━━━━━━━━━━━━━━━━━━ 27s 58ms/step - accuracy: 0.7545 - loss: 0.5620 - val_accuracy: 0.7329 - val_loss: 0.5957 - learning_rate: 1.5625e-05 - duration: 26.7127
Epoch 17/20
459/459 ━━━━━━━━━━━━━━━━━━━━ 27s 59ms/step - accuracy: 0.7618 - loss: 0.5518 - val_accuracy: 0.7422 - val_loss: 0.5825 - learning_rate: 7.8125e-06 - duration: 26.8689
Epoch 18/20
459/459 ━━━━━━━━━━━━━━━━━━━━ 27s 58ms/step - accuracy: 0.7557 - loss: 0.5547 - val_accuracy: 0.7403 - val_loss: 0.5864 - learning_rate: 7.8125e-06 - duration: 26.8242
Epoch 19/20
459/459 ━━━━━━━━━━━━━━━━━━━━ 27s 59ms/step - accuracy: 0.7573 - loss: 0.5516 - val_accuracy: 0.7441 - val_loss: 0.5780 - learning_rate: 3.9063e-06 - duration: 26.8649
Epoch 20/20
459/459 ━━━━━━━━━━━━━━━━━━━━ 27s 58ms/step - accuracy: 0.7542 - loss: 0.5538 - val_accuracy: 0.7445 - val_loss: 0.5779 - learning_rate: 3.9063e-06 - duration: 26.7374
Restoring model weights from the end of the best epoch: 20.
plot_convergence(history=history_1, title=f"Final model, dropout = {dropout}")
y_pred, y_true = make_predictions(model_1, test_ds)
plot_results(y_pred, y_true, title=f'Prediction stats, dropout = {dropout}', average='macro')
281/281 ━━━━━━━━━━━━━━━━━━━━ 4s 11ms/step
The two models performed well on the test data. The best model was the one with a dropout of $0.3$. If we take a look at the confusion matrix, we can make some observations:
- E is the best-classified category. This makes sense, because the ellipticals already look very different from the spiral galaxies.
- S is misclassified as E about as often as it is misclassified as SB.
- SB, however, produces fewer false positives.
correct = y_pred == y_true
files = list(map(lambda x: x.split("/")[-1].split(".")[0], test_ds.file_paths))
df_pred = pl.DataFrame({'correct': correct, 'asset_id': files})
pred_df = (
df
.with_columns(pl.col('asset_id').cast(pl.Utf8))
.join(df_pred, on='asset_id')
.with_columns(pl.col('correct').cast(pl.UInt8))
)
pred_df.head()
|   | dr7objid | asset_id | gz2class | total_classifications | total_votes | agreement | target | path_small | path_medium | path_large | correct |
|---|---|---|---|---|---|---|---|---|---|---|---|
| i64 | i64 | str | str | i64 | i64 | f64 | str | str | str | str | u8 |
| 0 | 587732591714893851 | "58957" | "Sc+t" | 45 | 342 | 1.0 | "S" | "/kaggle/input/… | "/kaggle/input/… | "/kaggle/input/… | 1 |
| 1 | 588009368545984617 | "193641" | "Sb+t" | 42 | 332 | 1.0 | "S" | "/kaggle/input/… | "/kaggle/input/… | "/kaggle/input/… | 1 |
| 3 | 587741723357282317 | "158501" | "Sc+t" | 28 | 218 | 0.766954 | "S" | "/kaggle/input/… | "/kaggle/input/… | "/kaggle/input/… | 1 |
| 10 | 587745403080146952 | "187749" | "Er" | 51 | 191 | 0.340943 | "E" | "/kaggle/input/… | "/kaggle/input/… | "/kaggle/input/… | 1 |
| 24 | 588295841247592461 | "238289" | "Er" | 33 | 110 | 0.349646 | "E" | "/kaggle/input/… | "/kaggle/input/… | "/kaggle/input/… | 1 |
plot_df = pred_df.group_by(['target', 'correct']).agg(pl.mean('total_votes'))
fig = go.Figure()
for i, c in enumerate(["Bad classification", "Correct classification"]):
    d = plot_df.filter(pl.col('correct') == i).sort('target')
    x = d['target'].to_numpy()
    y = d['total_votes'].to_numpy()
    fig.add_traces(go.Bar(x=x,
                          y=y,
                          name=c
                          ))
fig.update_layout(barmode='group',
                  title='Mean votes per target and classification success',
                  xaxis=dict(title='Category'),
                  yaxis=dict(title='Mean votes'),
                  )
fig.show()
We can see that the target with the best predictions (E) has the same mean number of votes for good and bad predictions. This is not the case for the categories S and SB, where the correct classifications have a higher mean vote count. This may suggest that, in our dataset, more votes were needed to reach agreement on the type of these galaxies.
In this project, we went through several steps: EDA, data preparation, model training and comparison, model selection, and model evaluation. Let's summarize each of these steps in a few words.
EDA
The EDA consisted of the following steps:
The knowledge acquired during this step was useful for the next ones.
Data preparation
In this step, we moved the files (using symlinks only) into folders so that they could be read in batches with TensorFlow. We applied downsampling, keeping only as many images per class as are available in the category with the fewest images.
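A minimal sketch of that symlink-based downsampling, assuming one subdirectory per class (a hypothetical helper, not the notebook's actual code):

```python
import random
import tempfile
from pathlib import Path

# Hypothetical sketch of the symlink-based downsampling described above:
# every class directory is trimmed to the size of the smallest class by
# linking (not copying) a random subset of its files into a new tree.
def downsample_with_symlinks(src_root: Path, dst_root: Path, seed: int = 42):
    rng = random.Random(seed)
    classes = {d.name: sorted(d.iterdir()) for d in src_root.iterdir() if d.is_dir()}
    n_min = min(len(files) for files in classes.values())
    for name, files in classes.items():
        out_dir = dst_root / name
        out_dir.mkdir(parents=True, exist_ok=True)
        for f in rng.sample(files, n_min):
            (out_dir / f.name).symlink_to(f.resolve())
    return n_min

# Tiny demo on synthetic empty files (illustration only)
with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp) / 'src', Path(tmp) / 'dst'
    for cls, n in [('E', 5), ('S', 8), ('SB', 3)]:
        (src / cls).mkdir(parents=True)
        for i in range(n):
            (src / cls / f'{i}.jpg').touch()
    n_min = downsample_with_symlinks(src, dst)
    counts = {d.name: len(list(d.iterdir())) for d in dst.iterdir()}
    print(n_min, counts)
```

Using symlinks keeps the balanced tree cheap to build and to throw away, since no image data is ever copied.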
Models
The functions declared in this step allowed us to run many training runs with several parameter sets. Two different architectures were built:
We also introduced data augmentation, which increased the accuracy of every model.
Model selection
Finally, we selected one model of type Xception and tried to tune its dropout. We ended up with a model with an accuracy of 75%, which is a decent result given the quality of the ground truth.
Overall the quality of the classification is good, but could be enhanced.
The difficulty of this project was the quality of the ground truth, which in this case may be wrong. It could be interesting to run an unsupervised classification to see the difference between the machine classification and the human classification. Other models could be tested as well.